Topic Detection in Tweets using Hashtags
By: Manzoor, Shazia.
Contributor(s): Siddiqi,Taha Hafeez.
Publisher: New Delhi STM Journals 2018Edition: Vol,5(2), May- Aug.Description: 6-14p.Subject(s): Computer EngineeringOnline resources: Click Here In: Recent trends in programming languagesSummary: Data clustering is a common technique used for topic detection and identification in text. In this paper we propose tokenizing of hashtags as an effective and efficient way of generating bags of words for tweets (text messages on Twitter). The document- term matrix generated using these terms is smaller and more relevant compared to the full text tokenization. Tweets from the 2018 Oscars event tagged with #oscars hashtag were collected and split into sets of 50000 tweets. Three strategies for tokenization of text and hashtags were tested using two popular data clustering algorithms, K-Means++ and Non-negative Matrix Factorization (NMF). The results were compared in terms of accuracy and performance. We concluded that tokenizing hashtags is a better alternative in terms of performance and achieves results equivalent to that of tokenizing tweet text.Item type | Current location | Call number | Status | Date due | Barcode | Item holds |
---|---|---|---|---|---|---|
Articles Abstract Database | School of Engineering & Technology Archieval Section | Not for loan | 2021-2021771 |
Data clustering is a common technique used for topic detection and identification in text. In this paper we propose tokenizing of hashtags as an effective and efficient way of generating bags of words for tweets (text messages on Twitter). The document- term matrix generated using these terms is smaller and more relevant compared to the full text tokenization. Tweets from the 2018 Oscars event tagged with #oscars hashtag were collected and split into sets of 50000 tweets. Three strategies for tokenization of text and hashtags were tested using two popular data clustering algorithms, K-Means++ and Non-negative Matrix Factorization (NMF). The results were compared in terms of accuracy and performance. We concluded that tokenizing hashtags is a better alternative in terms of performance and achieves results equivalent to that of tokenizing tweet text.
There are no comments for this item.